Text analysis
This section describes the text analysis we performed and presents the main results.
The text analysis is divided into two parts: wordclouds and sentiment analysis. Both the extracted wiki pages and the character dialogue are used, and we investigate how the wordclouds and the sentiment scores differ between the two data sets.
Wordclouds
First, we will take a look at wordclouds. As mentioned before, both the extracted wiki pages and the full series dialogue will be investigated. We will start by generating wordclouds for characters of interest. Here, we have selected the characters Jon Snow, Arya Stark, Bronn, Brienne of Tarth and Jaime Lannister. The first step in generating the wordclouds is to compute the term frequency-inverse document frequency (TF-IDF) for each text corpus, i.e. the wiki pages and the episode dialogue. For further explanation of TF-IDF and its computation we refer to the Explainer Notebook. It should be mentioned that we have removed all characters' names from the text corpus, as these would not be very descriptive of the character in a wordcloud or during sentiment analysis.
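As a rough illustration of the TF-IDF step, the following minimal Python sketch scores the terms of a few tokenized documents. The exact weighting used in this project is described in the Explainer Notebook; the variant below (relative term frequency multiplied by the natural log of the inverse document frequency), the function name tf_idf and the toy token lists are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {doc_name: {term: score}} for a dict of tokenized documents.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Made-up token lists standing in for cleaned character texts.
docs = {
    "character_a": ["sword", "wall", "wall", "oath"],
    "character_b": ["sword", "list", "needle"],
    "character_c": ["sword", "gold", "gold", "hand"],
}
scores = tf_idf(docs)

# Terms of one document, highest score first.
top_terms = sorted(scores["character_a"], key=scores["character_a"].get, reverse=True)
```

A term that occurs in every document (here "sword") gets a score of zero, which is why words shared by all characters do not dominate the wordclouds.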
Now, let's take a look at the generated wordclouds for the selected characters.
Wordclouds based on character wiki page & dialogue
When comparing the wordclouds generated from the two data sets, it should be noted that, for the most part, the same words do not appear for a given character in both. This is expected: the text from a character's wiki page is more descriptive of the character and their place in the story, whereas the dialogue wordcloud shows exactly that, the character's most descriptive words, according to TF-IDF, as spoken throughout the series. It would be interesting to compare this with the sentiment analysis, which is the second part of this page.
Wordclouds based on selected houses
Next, we will generate wordclouds based on the characters' allegiance. This is done by pooling the dialogue text of characters belonging to the same allegiance and, again, computing the respective TF-IDF scores in order to generate the wordclouds. For this, we have selected the houses Stark, Lannister, Targaryen and Greyjoy, and the independent group The Night's Watch. It would be interesting to see whether the house mottos appear in these wordclouds. The respective house mottos are:
House Stark: Winter is coming
House Lannister: Hear Me Roar!
House Targaryen: Fire and Blood
House Greyjoy: We Do Not Sow
As the Night's Watch is not a House but rather a brotherhood sworn to protect The Wall, they do not have a motto.
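The pooling step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name pool_by_allegiance, the shape of the two input dictionaries and the token lists are all assumptions.

```python
from collections import defaultdict


def pool_by_allegiance(dialogue, allegiance):
    """Merge per-character token lists into one token list per allegiance.

    dialogue:   {character: [token, ...]}  (hypothetical input shape)
    allegiance: {character: group name}    (hypothetical input shape)
    Characters without a known allegiance are skipped.
    """
    pooled = defaultdict(list)
    for character, tokens in dialogue.items():
        group = allegiance.get(character)
        if group is not None:
            pooled[group].extend(tokens)
    return dict(pooled)


# Made-up tokens; only the character and group names come from the text.
dialogue = {
    "Jon Snow": ["wall", "swear"],
    "Arya Stark": ["needle", "list"],
    "Jaime Lannister": ["hand", "gold"],
    "Bronn": ["castle"],
}
allegiance = {
    "Jon Snow": "Night's Watch",
    "Arya Stark": "Stark",
    "Jaime Lannister": "Lannister",
}
pooled = pool_by_allegiance(dialogue, allegiance)
```

Each pooled token list is then treated as a single document when computing TF-IDF, so a word scores highly for a house when it is frequent in that house's dialogue but rare in the others.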
When looking at the wordclouds above and the respective house mottos, only the word Hear from the Lannister motto is present (large, in the middle of the wordcloud). All the wordclouds are, however, very descriptive of the respective houses. For instance, for the Night's Watch, a military order sworn to protect The Wall, words like protect, wildling and swear are present. The same can be said for House Targaryen, whose main character, Daenerys, is married to a Dothraki warlord and, later in the show, leads the Dothraki people herself.
Wordclouds based on seasons
We will now generate wordclouds based on the wiki pages' season sections. It would be interesting to see how these wordclouds change as the story unfolds. It would also be interesting to investigate whether the overall theme changes over the course of the series and whether this can be seen in the wordclouds.
Taking the wordclouds generated for seasons 1 and 8 as examples, the emphasized words seem very descriptive of their respective seasons. Starting with season 1:
- execute, behead : One of the main events of season 1 is the execution of Lord Eddard Stark, the head of House Stark. By the unexpected command of King Joffrey Baratheon, he is beheaded in the middle of King's Landing.
- Khal, bloodrider : Another of the main story arcs is that of Daenerys Targaryen, which takes place in a foreign land. In season 1, Daenerys is married off to a powerful Khal, Khal Drogo, in a trade made by her brother. A Khal has three bloodriders who are to live and die by the life of their Khal. The prominence of the words Khal and bloodrider makes sense, as they are key roles in Daenerys' story arc.
Comparing the wordclouds of season 1 and season 8, season 8 appears to have different key words. For season 8:
- celebrate : The word celebrate stands in stark contrast to the prominent word suffer from season 1. This could be due to season 8 being the final season of the series, with its characters celebrating the story ending on a happy note (for some of the characters).
- reunite : The story culminates in the final season, where many characters who have been separated throughout the show are finally reunited, hence the emphasis on the word reunite makes sense.
It should also be noted that the word destroy is present in the majority of the wordclouds, being absent only from the wordclouds for seasons 1 and 3.
Sentiment of characters
In this second part of the text analysis, we will do a sentiment analysis of the characters, again based on both their wiki pages and their dialogue in the series. As we saw for the selected characters, the wordclouds based on the wiki pages and those based on the dialogue differed quite a lot. It would be interesting to see whether this also results in different sentiment levels for the characters. Additionally, we will do a sentiment analysis of the different seasons of the series, to determine whether any of the seasons differ significantly in sentiment.
For the sentiment analysis, we will apply both the dictionary-based method of LabMT and the rule- and dictionary-based method of VADER. For further explanation of how these sentiment scores are computed and the difference between the two methods, we again refer to the Explainer Notebook. It should be noted that the scales of the two methods differ: LabMT scores sentiment on a scale from 1 to 9, while VADER's compound score lies in the range [-1, 1] (the individual words in VADER's lexicon are rated from -4 to 4). For LabMT, a score of 5 is considered neutral, while for VADER a compound score within the range [-0.05, 0.05] is considered neutral.
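To make the two scales concrete, the sketch below shows a dictionary-based average in the style of LabMT and the normalization that, to our understanding, VADER's reference implementation uses to map a summed word valence onto the interval (-1, 1). The function names and the lexicon values are made up for illustration and are not taken from the real LabMT word list; VADER's rule-based adjustments (negation, intensifiers, punctuation) are deliberately left out.

```python
import math


def labmt_style_score(tokens, lexicon):
    """Average happiness of the tokens found in the lexicon (scale 1 to 9).

    Returns None when no token is rated, since an average is undefined.
    """
    rated = [lexicon[t] for t in tokens if t in lexicon]
    if not rated:
        return None
    return sum(rated) / len(rated)


def vader_style_compound(valence_sum, alpha=15.0):
    """Squash a summed valence into (-1, 1): x / sqrt(x^2 + alpha)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)


# Hypothetical happiness ratings, NOT values from the real LabMT list.
toy_lexicon = {"celebrate": 7.5, "reunite": 7.0, "destroy": 2.5, "suffer": 2.0}

happy = labmt_style_score(["celebrate", "reunite", "the"], toy_lexicon)
sad = labmt_style_score(["destroy", "suffer"], toy_lexicon)

# A summed valence of 0 stays neutral; large sums approach +1 or -1.
neutral = vader_style_compound(0.0)
positive = vader_style_compound(4.0)
```

Because of the normalization, a VADER compound score can approach but never exceed 1 in absolute value, however many strongly rated words a text contains, whereas a LabMT-style average is bounded by the most extreme ratings in the lexicon.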
Sentiment analysis of character dialogue
In this subsection we investigate the sentiment of the characters based on their dialogue, which is taken from transcripts. We use all dialogue across all seasons, as this is expected to give a better overview of each character's sentiment.
The figure below presents the sentiment of the 10 happiest and the 10 saddest characters. The figure to the left is based on LabMT, whereas the figure to the right is based on VADER.
It should be noted that the two methods do not completely agree, but some characters are present in both results: Daisy, Pyat Pree, Olyvar and Matthos Seaworth are among the 10 happiest characters according to both methods. Some characters, such as Gregor Clegane, are likewise present in both lists of the saddest characters.
The happiest characters appear to be quite happy according to both methods, considering that the scores cannot exceed 1 for VADER and 9 for LabMT. The same holds for the saddest characters relative to the lower ends of the two scales.
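The ranking and the comparison between the two methods can be sketched as follows. The helper name top_and_bottom, the placeholder character names and the scores are hypothetical and only serve to show the mechanics; n is set to 2 to keep the example small, where the figures use 10.

```python
def top_and_bottom(scores, n=10):
    """Return (n happiest, n saddest) names from a {name: score} dict.

    The happiest list starts with the highest score,
    the saddest list starts with the lowest score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:][::-1]


# Made-up per-character scores on the two scales.
labmt_scores = {"char_a": 6.1, "char_b": 5.8, "char_c": 5.2, "char_d": 4.9}
vader_scores = {"char_a": 0.4, "char_b": -0.1, "char_c": 0.3, "char_d": -0.3}

labmt_happy, labmt_sad = top_and_bottom(labmt_scores, n=2)
vader_happy, vader_sad = top_and_bottom(vader_scores, n=2)

# Characters that both methods place among the happiest / saddest.
happy_in_both = set(labmt_happy) & set(vader_happy)
sad_in_both = set(labmt_sad) & set(vader_sad)
```

Comparing rankings rather than raw scores sidesteps the fact that the two methods use different scales.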